Wikipedia Ad Hoc Passage Retrieval and Wikipedia Document Linking
Abstract
Ad hoc passage retrieval within Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a window of about 300 terms is consistently seen, and that retrieving a fixed-size window of this length might be a good retrieval strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document, the location of every search term was identified and the center (mean) of those locations computed. A fixed-size window was then centered on this location. A method of removing outliers was also examined, in which all terms occurring more than one standard deviation from the center were considered outliers and the center was recomputed without them. Both techniques were examined with and without stemming. For Wikipedia linking, we identified terms that were over-represented within the document and, from the top few, generated queries of different lengths. A BM25 ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed at the granularity of whole documents. The best performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more.
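The window placement described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the use of token offsets as term locations, the choice of the population standard deviation, and the clamping of the window to the document boundaries are all assumptions made for the example.

```python
from statistics import mean, pstdev


def window_center(positions, remove_outliers=False):
    """Mean position of the search-term hits in one document.

    With remove_outliers=True, hits further than one standard deviation
    from the mean are dropped and the mean is recomputed once.
    (Population standard deviation is an assumption of this sketch.)
    """
    if not positions:
        raise ValueError("no search-term positions")
    center = mean(positions)
    if remove_outliers and len(positions) > 1:
        spread = pstdev(positions)
        kept = [p for p in positions if abs(p - center) <= spread]
        if kept:
            center = mean(kept)
    return center


def fixed_window(positions, doc_length, size=300, remove_outliers=False):
    """Return (start, end) term offsets of a fixed-size window.

    The window is centered on the (optionally outlier-free) mean hit
    position and clamped to the document (clamping is an assumption).
    """
    center = window_center(positions, remove_outliers)
    start = int(round(center - size / 2))
    start = max(0, min(start, max(0, doc_length - size)))
    end = min(doc_length, start + size)
    return start, end


# Hypothetical hit positions: three clustered hits and one distant hit.
hits = [100, 120, 140, 900]
print(fixed_window(hits, doc_length=1000))
print(fixed_window(hits, doc_length=1000, remove_outliers=True))
```

In this hypothetical example the single distant hit drags the plain mean away from the cluster, while the outlier-removal variant recenters the window on the three clustered hits.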
Similar resources
Focused Search in Books and Wikipedia: Categories, Links and Relevance Feedback
In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc track we investigate focused link evidence, using only links from retrieved sections. The new collection is not only annotated with Wikipedia categories, but also with YAGO/WordNet categories. We explore how we can use both types of category information, in t...
Document Representation and Query Expansion Models for Blog Recommendation
We explore several different document representation models and two query expansion models for the task of recommending blogs to a user in response to a query. Blog relevance ranking differs from traditional document ranking in ad-hoc information retrieval in several ways: (1) the unit of output (the blog) is composed of a collection of documents (the blog posts) rather than a single document, ...
The Impact of Document Level Ranking on Focused Retrieval
Document retrieval techniques have proven to be competitive methods in the evaluation of focused retrieval. Although focused approaches such as XML element retrieval and passage retrieval allow for locating the relevant text within a document, using the larger context of the whole document often leads to superior document level ranking. In this paper we investigate the impact of using the docum...
Overview of the INEX 2008 Ad Hoc Track
This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were ch...
The Impact of Named Entity Normalization on Information Retrieval for Question Answering
In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normaliz...